Addressing Cross-Lingual Word Sense Disambiguation on Low-Density Languages: Application to Persian

نویسندگان

  • Navid Rekabsaz
  • Mihai Lupu
  • Allan Hanbury
  • Andrés Duque
چکیده

We explore the use of unsupervised methods in Cross-Lingual Word Sense Disambiguation (CL-WSD) with the application of English to Persian. Our proposed approach targets the languages with scarce resources (low-density) by exploiting word embedding and semantic similarity of the words in context. We evaluate the approach on a recent evaluation benchmark and compare it with the state-of-the-art unsupervised system (CO-Graph). The results show that our approach outperforms both the standard baseline and the CO-Graph system in both of the task evaluation metrics (Out-Of-Five and Best result).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Standard Test Collection for English-Persian Cross-Lingual Word Sense Disambiguation

In this paper, we address the shortage of evaluation benchmarks on Persian (Farsi) language by creating and making available a new benchmark for English to Persian Cross Lingual Word Sense Disambiguation (CL-WSD). In creating the benchmark, we follow the format of the SemEval 2013 CL-WSD task, such that the introduced tools of the task can also be applied on the benchmark. In fact, the new benc...

متن کامل

Cross-Lingual Word Sense Disambiguation for Languages with Scarce Resources

Word Sense Disambiguation has long been a central problem in computational linguistics. Word Sense Disambiguation is the ability to identify the meaning of words in context in a computational manner. Statistical and supervised approaches require a large amount of labeled resources as training datasets. In contradistinction to English, the Persian language has neither any semantically tagged cor...

متن کامل

Towards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian

Sense tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Since semantic annotations are usually performed by humans, such corpora are limited to a handful of tagged texts and are not available for many languages with scarce resources including Persian. The shortage of efficient, reliable linguistic res...

متن کامل

Automatic Wordnet Development for Low-Resource Languages using Cross-Lingual WSD

Wordnet is an effective resource in natural language processing and information retrieval, especially for semantic processing and meaning related tasks. So far wordnet has been constructed in many languages. However, automatic development of wordnet for lowresource languages has not been studied well. In this paper an Expectation-Maximization algorithm is used to train high quality and large sc...

متن کامل

LIMSI : Cross-lingual Word Sense Disambiguation using Translation Sense Clustering

We describe the LIMSI system for the SemEval-2013 Cross-lingual Word Sense Disambiguation (CLWSD) task. Word senses are represented by means of translation clusters in different languages built by a cross-lingual Word Sense Induction (WSI) method. Our CLWSD classifier exploits the WSI output for selecting appropriate translations for target words in context. We present the design of the system ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1711.06196  شماره 

صفحات  -

تاریخ انتشار 2017